Current models of image representation based on Convolutional Neural Networks (CNNs) have shown tremendous performance in image retrieval. Such models are inspired by the information flow along the visual pathway in the human visual cortex. We propose that, in the field of particular object retrieval, the process of extracting CNN representations from query images with a given region of interest (ROI) can also be modelled by taking inspiration from human vision. In particular, we show that making the CNN pay attention to the ROI while extracting the query image representation leads to significant improvement over baseline methods on the challenging Oxford5k and Paris6k datasets. Furthermore, we propose an extension to a recently introduced encoding method for CNN representations, regional maximum activations of convolutions (R-MAC). The proposed extension weights the regional representations using a novel saliency measure prior to aggregation, which further improves retrieval accuracy.
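To make the aggregation step concrete, the following is a minimal sketch of saliency-weighted R-MAC-style pooling over a convolutional feature map. It is an illustration, not the paper's implementation: it uses a plain grid of regions instead of R-MAC's overlapping windows, and a placeholder saliency (the region's mean activation) in place of the saliency measure proposed above; all function names here are hypothetical.

```python
import numpy as np

def rmac_regions(h, w, scales=(1, 2, 3)):
    """Generate square region boxes over an h x w feature map.

    Simplified non-overlapping grid; the original R-MAC samples
    overlapping regions (~40% overlap) at each scale.
    """
    boxes = []
    for s in scales:
        size = int(np.ceil(min(h, w) / s))  # region side length at this scale
        for y in range(0, h - size + 1, size):
            for x in range(0, w - size + 1, size):
                boxes.append((y, x, size))
    return boxes

def saliency_weighted_rmac(features, eps=1e-8):
    """Aggregate a C x H x W conv feature map into one global descriptor.

    Each region's max-pooled (MAC) vector is L2-normalised, then
    weighted by a simple saliency score before summation. The mean
    activation used here is only a stand-in for the paper's measure.
    """
    c, h, w = features.shape
    descriptor = np.zeros(c)
    for (y, x, size) in rmac_regions(h, w):
        region = features[:, y:y + size, x:x + size]
        vec = region.max(axis=(1, 2))               # MAC of the region
        vec = vec / (np.linalg.norm(vec) + eps)     # L2-normalise
        weight = region.mean()                      # placeholder saliency
        descriptor += weight * vec
    return descriptor / (np.linalg.norm(descriptor) + eps)

# Example: a random 512-channel feature map, as produced e.g. by a
# VGG-style conv layer on a resized input image.
feats = np.random.rand(512, 16, 16)
desc = saliency_weighted_rmac(feats)
print(desc.shape)  # (512,)
```

The resulting unit-norm descriptor can be compared across images with a dot product, which is how such global descriptors are typically used for retrieval.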